Learning Cut Generating Functions for Integer Programming

Neural Information Processing Systems

The branch-and-cut algorithm is the method of choice to solve large-scale integer programming problems in practice. A key ingredient of branch-and-cut is the use of cutting planes, which are derived constraints that reduce the search space for an optimal solution. Selecting effective cutting planes to produce small branch-and-cut trees is a critical challenge in the branch-and-cut algorithm. Recent advances have employed a data-driven approach to select good cutting planes from a parameterized family, aimed at reducing the branch-and-bound tree size (in expectation) for a given distribution of integer programming instances. We extend this idea to the selection of the best cut generating function (CGF), which is a tool in the integer programming literature for generating a wide variety of cutting planes that generalize the well-known Gomory Mixed-Integer (GMI) cutting planes. We provide rigorous sample complexity bounds for the selection of an effective CGF from certain parameterized families that provably performs well for any specified distribution on the problem instances. Our empirical results show that the selected CGF can outperform the GMI cuts for certain distributions. Additionally, we explore the sample complexity of using neural networks for instance-dependent CGF selection.
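
As one concrete instance of a cut generating function, the classical GMI function mentioned above can be sketched as follows (a minimal illustration of the standard GMI formula, not the paper's parameterized family):

```python
import math

def gmi_coefficient(r, f0):
    """GMI cut-generating function pi_{f0}(r) for an integer variable
    with tableau coefficient r, where f0 = frac(b) lies in (0, 1)."""
    f = r - math.floor(r)  # fractional part of r
    if f <= f0:
        return f / f0
    return (1.0 - f) / (1.0 - f0)

# Example: a tableau row with right-hand side b = 2.4, so f0 = 0.4.
# The resulting cut reads sum_j pi_{f0}(a_j) x_j >= 1 over the nonbasic
# integer variables with tableau coefficients a_j.
f0 = 0.4
coeffs = [gmi_coefficient(r, f0) for r in [0.2, 0.4, 0.7, 1.2]]
```

Note that pi_{f0} depends on its argument only through the fractional part, which is what lets one function generate a cut coefficient for every column of the row.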


Quantum Fisher information matrices from Rényi relative entropies

Wilde, Mark M.

arXiv.org Artificial Intelligence

Quantum generalizations of the Fisher information are important in quantum information science, with applications in high energy and condensed matter physics and in quantum estimation theory, machine learning, and optimization. One can derive a quantum generalization of the Fisher information matrix in a natural way as the Hessian matrix arising in a Taylor expansion of a smooth divergence. Such an approach is appealing for quantum information theorists, given the ubiquity of divergences in quantum information theory. In contrast to the classical case, there is not a unique quantum generalization of the Fisher information matrix, similar to how there is not a unique quantum generalization of the relative entropy or the Rényi relative entropy. In this paper, I derive information matrices arising from the log-Euclidean, $α$-$z$, and geometric Rényi relative entropies, with the main technical tool for doing so being the method of divided differences for calculating matrix derivatives. Interestingly, for all non-negative values of the Rényi parameter $α$, the log-Euclidean Rényi relative entropy leads to the Kubo-Mori information matrix, and the geometric Rényi relative entropy leads to the right-logarithmic derivative Fisher information matrix. Thus, the resulting information matrices obey the data-processing inequality for all non-negative values of the Rényi parameter $α$ even though the original quantities do not. Additionally, I derive and establish basic properties of $α$-$z$ information matrices resulting from the $α$-$z$ Rényi relative entropies. For parameterized thermal states and time-evolved states, I establish formulas for their $α$-$z$ information matrices and hybrid quantum-classical algorithms for estimating them, with applications in quantum Boltzmann machine learning.
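
For orientation, the classical analogue of the construction described above is the standard identity expressing the Fisher information matrix as the Hessian of the KL divergence in its second argument (a textbook fact, stated here only to illustrate the Hessian-of-a-divergence recipe):

$$ F_{ij}(\theta) \;=\; \frac{\partial^2}{\partial \theta'_i \, \partial \theta'_j} \, D\!\left(p_\theta \,\middle\|\, p_{\theta'}\right) \Big|_{\theta' = \theta} \;=\; \mathbb{E}_{p_\theta}\!\left[ \partial_i \log p_\theta \; \partial_j \log p_\theta \right]. $$

In the quantum setting, different choices of divergence $D$ in the left-hand expression yield the inequivalent information matrices discussed in the abstract.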


On Functional Dimension and Persistent Pseudodimension

Grigsby, J. Elisenda, Lindsey, Kathryn

arXiv.org Artificial Intelligence

For any fixed feedforward ReLU neural network architecture, it is well-known that many different parameter settings can determine the same function. It is less well-known that the degree of this redundancy is inhomogeneous across parameter space. In this work, we discuss two locally applicable complexity measures for ReLU network classes and what we know about the relationship between them: (1) the local functional dimension [14, 18], and (2) a local version of VC dimension that we call persistent pseudodimension. The former is easy to compute on finite batches of points; the latter should give local bounds on the generalization gap, which would inform an understanding of the mechanics of the double descent phenomenon [7].
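
The parameter redundancy mentioned above can be seen already in a one-hidden-unit network: by positive homogeneity of ReLU, rescaling the incoming weight by c > 0 and the outgoing weight by 1/c leaves the computed function unchanged (a minimal illustration, not the paper's complexity measures):

```python
def relu_net(w, a, x):
    # one-hidden-unit ReLU network: x -> a * max(0, w * x)
    return a * max(0.0, w * x)

# Positive rescaling symmetry: (w, a) and (c*w, a/c) determine the
# same function for any c > 0, so this direction in parameter space
# contributes no functional dimension.
c = 2.0
for x in [-1.5, 0.0, 0.3, 2.0]:
    assert abs(relu_net(3.0, 4.0, x) - relu_net(c * 3.0, 4.0 / c, x)) < 1e-12
```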


Natural gradient and parameter estimation for quantum Boltzmann machines

Patel, Dhrumil, Wilde, Mark M.

arXiv.org Artificial Intelligence

Thermal states play a fundamental role in various areas of physics, and they are becoming increasingly important in quantum information science, with applications related to semi-definite programming, quantum Boltzmann machine learning, Hamiltonian learning, and the related task of estimating the parameters of a Hamiltonian. Here we establish formulas underlying the basic geometry of parameterized thermal states, and we delineate quantum algorithms for estimating the values of these formulas. More specifically, we prove formulas for the Fisher--Bures and Kubo--Mori information matrices of parameterized thermal states, and our quantum algorithms for estimating their matrix elements involve a combination of classical sampling, Hamiltonian simulation, and the Hadamard test. These results have applications in developing a natural gradient descent algorithm for quantum Boltzmann machine learning, which takes into account the geometry of thermal states, and in establishing fundamental limitations on the ability to estimate the parameters of a Hamiltonian, when given access to thermal-state samples. For the latter task, and for the special case of estimating a single parameter, we sketch an algorithm that realizes a measurement that is asymptotically optimal for the estimation task. We finally stress that the natural gradient descent algorithm developed here can be used for any machine learning problem that employs the quantum Boltzmann machine ansatz.
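
A natural gradient descent step of the kind discussed above preconditions the ordinary gradient by the inverse of an information matrix. The following is a minimal classical sketch for a 2x2 information matrix, solved by Cramer's rule (the matrix and gradient values are illustrative, not outputs of the paper's quantum algorithms):

```python
def natural_gradient_step(theta, grad, fisher, eta):
    """One natural-gradient update theta <- theta - eta * F^{-1} grad,
    for a 2x2 information matrix F, inverted via Cramer's rule."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    g0, g1 = grad
    step0 = (d * g0 - b * g1) / det  # first component of F^{-1} grad
    step1 = (a * g1 - c * g0) / det  # second component of F^{-1} grad
    return [theta[0] - eta * step0, theta[1] - eta * step1]

# With F = diag(2, 0.5) and grad = (2, 1), the preconditioned step is (1, 2):
# the update moves less along well-curved directions and more along flat ones.
new_theta = natural_gradient_step([0.0, 0.0], [2.0, 1.0], [[2.0, 0.0], [0.0, 0.5]], 1.0)
```

In the quantum Boltzmann machine setting, F would be the Fisher-Bures or Kubo-Mori matrix whose entries the paper's algorithms estimate.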


Frank's triangular norms in Piaget's logical proportions

Prade, Henri, Richard, Gilles

arXiv.org Artificial Intelligence

Starting from the Boolean notion of logical proportion in Piaget's sense, which turns out to be equivalent to analogical proportion, this note proposes a definition of analogical proportion between numerical values based on triangular norms (and dual co-norms). Frank's family of triangular norms is particularly interesting from this perspective. The article concludes with a comparative discussion with another very recent proposal for defining analogical proportions between numerical values based on the family of generalized means.
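
Frank's family of triangular norms referred to above has the standard closed form T_s(x, y) = log_s(1 + (s^x - 1)(s^y - 1)/(s - 1)) for s > 0, s != 1; it interpolates between the minimum (s -> 0), product (s -> 1), and Lukasiewicz (s -> infinity) t-norms. A minimal sketch of the norm itself (not of the paper's analogical-proportion construction):

```python
import math

def frank_tnorm(x, y, s):
    """Frank triangular norm T_s(x, y) on [0, 1], for s > 0 and s != 1."""
    num = (s ** x - 1.0) * (s ** y - 1.0)
    return math.log(1.0 + num / (s - 1.0), s)

# Boundary conditions of a t-norm: T(1, y) = y and T(0, y) = 0.
t_one = frank_tnorm(1.0, 0.7, 2.0)   # approximately 0.7
t_zero = frank_tnorm(0.0, 0.7, 2.0)  # 0.0

# Near s = 1 the family approaches the product t-norm: T(x, y) -> x * y.
t_prod = frank_tnorm(0.5, 0.5, 1.000001)  # approximately 0.25
```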


Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms

Mohammadi, Hesameddin, Razaviyayn, Meisam, Jovanović, Mihailo R.

arXiv.org Artificial Intelligence

We study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This class of algorithms includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases and noise accounts for uncertainty in either gradient evaluation or iteration updates. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and identify fundamental stochastic performance tradeoffs. Our approach utilizes the Jury stability criterion to provide a novel geometric characterization of conditions for linear convergence, and it clarifies the relation between the noise amplification and convergence rate as well as their dependence on the condition number and the constant algorithmic parameters. This geometric insight leads to simple alternative proofs of standard convergence results and allows us to establish analytical lower bounds on the product between the settling time and noise amplification that scale quadratically with the condition number. Our analysis also identifies a key difference between the gradient and iterate noise models: while the amplification of gradient noise can be made arbitrarily small by sufficiently decelerating the algorithm, the best achievable variance amplification for the iterate noise model increases linearly with the settling time in decelerating regime. Furthermore, we introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving order-wise Pareto optimality for both noise models. Finally, by analyzing a class of accelerated gradient flow dynamics, whose suitable discretization yields the two-step momentum algorithm, we establish that stochastic performance tradeoffs also extend to continuous time.
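
The two-step momentum iteration studied above can be sketched on a scalar quadratic f(x) = a x^2 / 2, where the update uses the two previous iterates and optionally carries additive white noise (a toy sketch of the setting; the step sizes and noise model here are illustrative, not the paper's Pareto-optimal parameter families):

```python
import random

def heavy_ball(a, alpha, beta, x0, steps, noise=0.0, rng=None):
    """Two-step momentum iteration on f(x) = a * x^2 / 2:
    x_{k+1} = x_k - alpha * f'(x_k) + beta * (x_k - x_{k-1}) + w_k,
    with w_k optional additive white noise on the update."""
    x_prev, x = x0, x0
    for _ in range(steps):
        w = rng.gauss(0.0, noise) if rng else 0.0
        x, x_prev = x - alpha * a * x + beta * (x - x_prev) + w, x
    return x

# Noiseless run converges linearly to the minimizer x* = 0.
x_final = heavy_ball(a=1.0, alpha=0.1, beta=0.5, x0=1.0, steps=200)

# With noise, the iterates fluctuate around x* with a steady-state variance
# that depends on (alpha, beta) -- the quantity the paper trades off
# against the convergence rate.
x_noisy = heavy_ball(a=1.0, alpha=0.1, beta=0.5, x0=1.0, steps=200,
                     noise=0.01, rng=random.Random(0))
```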


Actor-Critic Algorithms

Konda, Vijay R., Tsitsiklis, John N.

Neural Information Processing Systems

We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.
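
The two-time-scale structure described above can be sketched on the smallest possible example: a one-state MDP with two actions, a softmax actor with a single logit parameter, and a critic that tracks the average reward on a faster timescale (a toy sketch for intuition, not the paper's linear-architecture algorithm or its convergence analysis):

```python
import math
import random

random.seed(0)

# One-state, two-action MDP: action 0 yields reward 1, action 1 yields reward 0.
theta = 0.0   # actor parameter: logit of choosing action 0
value = 0.0   # critic estimate of the expected reward
lr_critic, lr_actor = 0.1, 0.05  # critic learns on the faster timescale

for _ in range(5000):
    p0 = 1.0 / (1.0 + math.exp(-theta))        # softmax policy over two actions
    action = 0 if random.random() < p0 else 1
    reward = 1.0 if action == 0 else 0.0
    td_error = reward - value                  # TD error (single state, no bootstrap)
    value += lr_critic * td_error              # critic update
    grad_logpi = (1.0 - p0) if action == 0 else -p0
    theta += lr_actor * td_error * grad_logpi  # actor step along the critic's signal

p0 = 1.0 / (1.0 + math.exp(-theta))  # learned probability of the better action
```

After training, the actor concentrates on the rewarding action (p0 close to 1), while the critic's value estimate tracks the policy's expected reward.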


Family Discovery

Omohundro, Stephen M.

Neural Information Processing Systems

"Family discovery" is the task of learning the dimension and structure ofa parameterized family of stochastic models. It is especially appropriatewhen the training examples are partitioned into "episodes" of samples drawn from a single parameter value. We present three family discovery algorithms based on surface learning andshow that they significantly improve performance over two alternatives on a parameterized classification task. 1 INTRODUCTION Human listeners improve their ability to recognize speech by identifying the accent of the speaker. "Might" in an American accent is similar to "mate" in an Australian accent. By first identifying the accent, discrimination between these two words is improved.

